A Stochastic Morphological Analysis for Japanese employing Character n-Gram and k-NN Method

Author

Kenji Nagamatsu

Abstract

Because Japanese corpora have recently been developed, it has become possible to perform stochastic morphological analysis for Japanese (Nagata, 1994; Takeuchi and Matsumoto, 1995; Mori and Nagao, 1996; Yamamoto et al., 1997). Although the Hidden Markov Model-based approach used for English is fundamentally applicable given word/part-of-speech n-gram data, some problems peculiar to Japanese make the approach indirect. Before the most likely part-of-speech (abbreviated to 'pos') sequence can be calculated, input sentences, which are written without spaces between words, must be segmented into morphemes by consulting word dictionaries.
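
The dictionary-then-HMM pipeline described above can be illustrated with a small sketch. It is not the method of this paper (which, per its title, employs character n-grams and k-NN); it assumes a hypothetical toy lexicon `LEXICON` of P(word | pos) values and hypothetical pos-bigram probabilities `TRANSITIONS`, all made up for illustration. The function `analyze` (also a hypothetical name) builds the lattice of dictionary matches and runs a Viterbi search that chooses the segmentation and the pos sequence jointly.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: surface form -> {pos: P(word | pos)}. Values are made up.
LEXICON: Dict[str, Dict[str, float]] = {
    "くるま": {"noun": 0.2},
    "くる": {"verb": 0.3},
    "ま": {"noun": 0.01},
    "で": {"particle": 0.4},
    "まで": {"particle": 0.3},
    "いく": {"verb": 0.3},
}

# Hypothetical pos bigram probabilities P(pos_i | pos_{i-1}); "BOS"/"EOS" mark
# the sentence edges. Missing pairs are treated as impossible transitions.
TRANSITIONS: Dict[Tuple[str, str], float] = {
    ("BOS", "noun"): 0.5,
    ("BOS", "verb"): 0.3,
    ("noun", "particle"): 0.6,
    ("verb", "particle"): 0.3,
    ("verb", "noun"): 0.1,
    ("particle", "verb"): 0.5,
    ("verb", "EOS"): 0.6,
    ("noun", "EOS"): 0.3,
}


def analyze(sentence: str) -> List[Tuple[str, str]]:
    """Return the most likely (morpheme, pos) sequence, or [] if none exists."""
    n = len(sentence)
    # best[i][pos] = (log prob, start index, previous pos, surface form) for the
    # best path whose last morpheme ends at character i and has tag `pos`.
    best: List[Dict[str, Tuple[float, int, str, str]]] = [dict() for _ in range(n + 1)]
    best[0]["BOS"] = (0.0, -1, "", "")
    max_len = max(len(w) for w in LEXICON)

    for end in range(1, n + 1):
        # Every dictionary word ending at `end` is an edge in the lattice.
        for start in range(max(0, end - max_len), end):
            word = sentence[start:end]
            for pos, emit in LEXICON.get(word, {}).items():
                for prev_pos, (score, *_rest) in best[start].items():
                    trans = TRANSITIONS.get((prev_pos, pos), 0.0)
                    if trans == 0.0:
                        continue
                    cand = score + math.log(trans) + math.log(emit)
                    # Keep only the best path per (end position, pos) state.
                    if pos not in best[end] or cand > best[end][pos][0]:
                        best[end][pos] = (cand, start, prev_pos, word)

    # Add the transition into EOS and pick the best final state.
    final = None
    for pos, (score, *_rest) in best[n].items():
        trans = TRANSITIONS.get((pos, "EOS"), 0.0)
        if trans > 0.0:
            total = score + math.log(trans)
            if final is None or total > final[0]:
                final = (total, pos)
    if final is None:
        return []  # no dictionary path covers the whole input

    # Follow back-pointers from the end of the sentence back to BOS.
    result: List[Tuple[str, str]] = []
    i, pos = n, final[1]
    while i > 0:
        _, start, prev_pos, word = best[i][pos]
        result.append((word, pos))
        i, pos = start, prev_pos
    return result[::-1]


if __name__ == "__main__":
    print(analyze("くるまでいく"))
```

With these made-up numbers, `analyze("くるまでいく")` returns `[("くるま", "noun"), ("で", "particle"), ("いく", "verb")]`, because that path outscores the competing segmentation くる/まで/いく. Keeping one best path per (end position, pos) pair is exact for a pos bigram model. In this toy sketch, any input containing a string absent from the lexicon receives no analysis at all.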

Similar resources

Japanese Unknown Word Identification by Character-based Chunking

We introduce character-based chunking for unknown word identification in Japanese text. A major advantage of our method is its ability to detect low-frequency unknown words of unrestricted character-type patterns. The method is built upon SVM-based chunking, using character n-grams and the surrounding context of n-best word segmentation candidates from statistical morphological analysis as feat...

Report TR99-1756: Unsupervised Statistical Segmentation of Japanese Kanji

Word segmentation is an important issue in Japanese language processing because Japanese is written without space delimiters between words. We propose a simple dictionary-less method to segment Japanese kanji sequences into words based solely on character n-gram counts from an unannotated corpus. The performance was often better than that of rule-based morphological analyzers over a variety of ...

TR99-1756: Unsupervised Statistical Segmentation of Japanese Kanji Strings

The Design of a Nearest-Neighbor Classifier and Its Use for Japanese Character Recognition

The nearest neighbor (NN) approach is a powerful nonparametric technique for pattern classification tasks. Although the brute-force NN algorithm is simple and has high accuracy, its computation cost is usually very expensive, especially for applications such as Japanese character recognition in which the number of categories is large. Many methods have been proposed to improve the efficiency of NN...

The design of a nearest-neighbor classifier and its use for Japanese character recognition

The nearest neighbor (NN) approach is a powerful nonparametric technique for pattern classification tasks. In this paper, algorithms for prototype reduction, hierarchical prototype organization and fast NN search are described. To remove redundant category prototypes and to avoid redundant comparisons, the algorithms exploit geometrical information of a given prototype set which is represented a...

Publication date: 1997